LAPIN: Effective Sequential Pattern Mining Algorithms by Last Position Induction for Dense Databases

نویسندگان

  • Zhenglu Yang
  • Yitong Wang
  • Masaru Kitsuregawa
چکیده

Sequential pattern mining is very important because it is the basis of many applications. Although there has been a great deal of effort on sequential pattern mining in recent years, its performance is still far from satisfactory because of two main challenges: large search spaces and the ineffectiveness in handling dense datasets. To offer a solution to the above challenges, we have proposed a series of novel algorithms, called the LAst Position INduction (LAPIN) sequential pattern mining, which is based on the simple idea that the last position of an item, α, is the key to judging whether or not a frequent k-length sequential pattern can be extended to be a frequent (k+1)-length pattern by appending the item α to it. LAPIN can largely reduce the search space during the mining process, and is very effective in mining dense datasets. Our performance study demonstrates that LAPIN outperforms PrefixSpan [4] by up to an order of magnitude on long pattern dense datasets.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Effective Mining Sequential Pattern by Last Position Induction

Sequence pattern mining is an important research problem because it is the basis of many other applications. Yet how to efficiently implement the mining is difficult due to the inherent characteristic of the problem the large size of the data set. In this paper, by combining SPAM, we propose a new algorithm called LAst Position INduction Sequential PAttern Mining (abbreviated as LAPIN-SPAM), wh...

متن کامل

An Effective System for Mining Web Log

The WWW provides a simple yet effective media for users to search, browse, and retrieve information in the Web. Web log mining is a promising tool to study user behaviors, which could further benefit web-site designers with better organization and services. Although there are many existing systems that can be used to analyze the traversal path of web-site visitors, their performance is still fa...

متن کامل

Mining Sequential Patterns in Dense Databases

Sequential pattern mining is an important data mining problem with broad applications, including the analysis of customer purchase patterns, Web access patterns, DNA analysis, and so on. We show on dense databases, a typical algorithm like Spade algorithm tends to lose its efficiency. Spade is based on the used of lists containing the localization of the occurrences of pattern in the sequences ...

متن کامل

Updating the Built Prelarge Fast Updated Sequential Pattern Trees with Sequence Modification

Copyright © 2008, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited. Association rule mining in spatial databases and temporal databases have been studied extensively in data mining research. Most of the research studies have found interesting patterns in either spatial information or temporal information, however, few studie...

متن کامل

Abstract—Mining Sequential Patterns in large databases has become

Mining Sequential Patterns in large databases has become an important data mining task with broad applications. It is an important task in data mining field, which describes potential sequenced relationships among items in a database. There are many different algorithms introduced for this task. Conventional algorithms can find the exact optimal Sequential Pattern rule but it takes a long time,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007